ABSTRACT


System dynamics modelers have often been criticized for their informal 
methods of model validation and for not using more formal, quantifiable 
measures to lend confidence to the validation process. Numerous 
proponents of the system dynamics approach have highlighted this 
shortcoming, however, and have suggested a variety of appropriate statistical 
measures which could be used in the model validation process. 

The objective of this thesis is to complement earlier validation efforts of 
the Abdel-Hamid and Madnick System Dynamics Model of Software 
Development by submitting the model to a battery of appropriate statistical 
measures. The model is evaluated with statistics which have been used by 
others in the system dynamics field. The evaluation makes two different 
comparisons. First, an evaluative comparison is made between data 
generated by the model and actual data of two real software projects. Then, an 
evaluative comparison is made between model generated data and data 
obtained by direct experimentation for two different experiments, using the 
model's gaming interface. The two evaluations serve to promote confidence
in the model.
I. INTRODUCTION


A. BACKGROUND 

System Dynamics modelers have been criticized for their qualitative, 
informal methods of model evaluation and for not utilizing quantitative, 
objective measures of model validation. As stated by Sterman "...the validity 
of system dynamics models is often questioned even when their 
correspondence to historical behavior is quite good...the failure to present 
formal analysis of historical behavior creates an impression of sloppiness and 
unprofessionalism.” (Sterman, 1984, p. 51) Numerous proponents of the 
system dynamics approach have highlighted this shortcoming, however, and 
many have suggested various means to tackle the problem (Barlas 1989, 
Forrester and Senge 1980, Naylor 1971, Rowland 1978, Sterman 1984). 

There is, however, an even more basic issue that warrants discussion 
before specifically addressing the problem of validation. The issue, as 
discussed by Barlas and Carpenter (1990), is a result of two differing 
philosophies of science, the traditional logical empiricist philosophy and the 
more recent relativist philosophy. The logical empiricist "...assumes
that knowledge is an objective representation of reality and that theory
justification can be an objective formal process." (Barlas and Carpenter, 1990,
p. 148) The relativist, in contrast, holds that "...knowledge is relative to a given
society, epoch, and scientific world view. Theory justification is therefore a
semiformal, relative social process." (Barlas and Carpenter, 1990, p. 148) The
authors argue that the relativist philosophy is the appropriate one to
hold for the system dynamics methodology in the context of model
validation. The relativist philosophy has a certain appeal when contrasted with
the empiricist view. The empiricist would espouse a given model as an objective,
absolute representation of reality that, as such, could be empirically
evaluated as true (or false) (Barlas and Carpenter 1990). The relativist
would view a given model as only one of many ways to portray reality, with
no model able to claim absolute objectivity, although one model may be
more effective than another (Barlas and Carpenter 1990). Those who are
familiar with and use the system dynamics methodology could readily identify
with the relativist viewpoint.

The validation of a system dynamics model is, thus, not a simple matter 
of subjecting a model to some standard set of classic statistical tests. As 
pointed out by Barlas "System Dynamics models have certain characteristics 
that render standard statistical tests inappropriate." (Barlas, 1989, p. 59) This 
does not mean that the validation process for a system dynamics model 
should be solely qualitative. It means that a system dynamics modeler needs 
to employ tests, both quantitative and qualitative, that can serve to evaluate a 
given model. 

As stated by Forrester and Senge, "There is no single test which serves to
'validate' a system dynamics model. Rather, confidence in a system dynamics
model accumulates gradually as the model passes more tests and as new
points of correspondence between the model and empirical reality are
identified." (Forrester and Senge, 1980, p. 209) This point is emphasized by
many in discussions of model validation (see for example Barlas and
Carpenter 1990, Richardson and Pugh 1981, Sterman 1984). The consensus is
that validating a system dynamics model should involve a continuous cycle of
confidence-building tests throughout the iterative development of the model.
In essence, the utility of a simulation model depends upon the confidence 
that the model users have in the model. Each test should not serve as an end 
in itself, but merely as one of many steps which serve to build that 
confidence. 

Richardson and Pugh address model validity from several
perspectives. The first involves validity and model
purpose: "...it is meaningless to try to judge validity in the absence of a clear
view of model purpose." (Richardson and Pugh, 1981, p. 310) Richardson and
Pugh also discuss model validity in terms of a model's suitability and
consistency. In doing so they pose two questions: "Is the model suitable for its
purposes and the problem it addresses?" and "Is the model consistent with
the slice of reality it tries to capture?" (Richardson and Pugh, 1981, p. 312)
Since no model can claim absolute truth, the best that can be hoped for is that
the model be suitable for its purpose and consistent with reality.


B. PURPOSE OF THESIS RESEARCH 

The focus of this thesis is the evaluation of the ability of the software
development system dynamics model developed by Abdel-Hamid and
Madnick (1991) to satisfactorily match the historical data of the system it was
designed to model. Sterman (1984) described the evaluation of a model's
historical fit as a weak test by itself, while noting that "Failure to satisfy a
client or reviewer that a model's historical fit is satisfactory is often sufficient
grounds to dismiss the model and its conclusions." (Sterman, 1984, p. 52)

Generally speaking, the historical fit of a system dynamics model is adequate
for the model's purpose. The problem stems from the manner in which historical fit has
been presented. Typically, when system dynamicists consider the matter of
historical fit, they simply display a graph of the model's data against the actual
historical data. The reader is then left to decide whether he agrees or disagrees with
the modeler's opinion of adequate fit.


A more formal measure of goodness-of-fit would be more appropriate
for the system dynamics model validation process. Several statements by
Sterman are worthy of note regarding the evaluation of system dynamics
models. "A good system dynamics model is expected to generate the
historical behavior of the system endogenously, and without extensive 
use of exogenous or dummy variables. Historical data should not be 
used to estimate the parameters of a model directly...system dynamics 
models do not usually employ formal estimation procedures that 
guarantee a minimum sum-of-squared-errors over the range of available 
data as in regression. As a result, the error between simulated and actual 
data may be larger than typically found in regression models...because 
exogenous and dummy variable are not used and the historical data are 
not used to derive parameters that minimize some measure of error, 
larger errors than are typical in regression models do not necessarily 
compromise the validity of system dynamics models or imply lack of 
confidence in their results." (Sterman, 1984, p. 53) 


C. SCOPE AND NATURE OF THESIS RESEARCH

The question then becomes which quantitative test(s) should be used to
evaluate historical fit. This question has been addressed by several researchers in the system
dynamics field (see for example Barlas 1989, Forrester and Senge 1980,
Rowland and Holmes 1978, Sterman 1984). This thesis will rely upon the
methods used by Sterman (MSE, RMSPE, and Theil statistics) and on Theil's
inequality coefficient (Theil 1961 and 1966). Theil's inequality coefficient has
become a standard validation tool for economists and students of social
systems (Rowland and Holmes 1978, Senge 1973). These tests will be applied
to the Abdel-Hamid and Madnick model (Abdel-Hamid and Madnick 1991) in
four different cases which compare actual output to the model's output. A
description of each test follows:
1. Mean-Square-Error (MSE) Test 
The mean-square-error (MSE), a measure of forecast error, is defined
as:

$$\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(S_t - A_t\right)^2$$

where
n = Number of observations (t = 1, ..., n)
S_t = Simulated value at time t
A_t = Actual value at time t


The MSE measures the deviation of the simulated variable from the actual
value over a given time period. The advantages of this measure are that
large errors are weighted more heavily than small ones and that errors of
opposite sign do not cancel each other out (Sterman 1984). By taking the
square root of the MSE, the forecast error can be put into the same units as the
variable in question. This measure is referred to as the root-mean-square
(RMS) simulation error (Pindyck and Rubenfield 1991).
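As a rough illustration of the two measures just defined, the Python sketch below computes the MSE and the RMS simulation error for a pair of equal-length series. The function names `mse` and `rms_error` and the sample numbers are hypothetical; the thesis itself computed these statistics in a spreadsheet.

```python
import math
from typing import Sequence


def mse(simulated: Sequence[float], actual: Sequence[float]) -> float:
    """Mean-square error: the average of (S_t - A_t)^2 over n paired observations."""
    if len(simulated) != len(actual) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    return sum((s - a) ** 2 for s, a in zip(simulated, actual)) / len(actual)


def rms_error(simulated: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square simulation error, expressed in the units of the variable."""
    return math.sqrt(mse(simulated, actual))


# Hypothetical data: errors of +2, -2, +2, -2 average to zero,
# but squaring keeps them from cancelling.
actual = [300.0, 310.0, 320.0, 330.0]
simulated = [302.0, 308.0, 322.0, 328.0]
print(mse(simulated, actual))        # 4.0
print(rms_error(simulated, actual))  # 2.0
```

Although the mean of the raw errors in this example is zero, the MSE is not, which is the "no cancellation" property noted above.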
2. Root-Mean-Square Percent Error (RMSPE) Test 

A more convenient measure of forecast error is the root-mean-square
percent error (RMSPE), which provides a normalized version of the error and
is defined as:

$$\mathrm{RMSPE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\frac{S_t - A_t}{A_t}\right)^2}$$

This also measures the deviation of the simulated variable from the actual
value over a given time period, but expresses it in percentage terms (the
quantity above multiplied by 100) (Pindyck and Rubenfield 1991).
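A minimal Python sketch of the RMSPE, assuming the result is reported as a percentage (the ratio multiplied by 100), as in the tables of this thesis. The function name and sample data are hypothetical, and the guard against zero actual values is an added assumption: the formula divides by A_t, so it is undefined there.

```python
import math
from typing import Sequence


def rmspe(simulated: Sequence[float], actual: Sequence[float]) -> float:
    """Root-mean-square percent error, returned as a percentage (ratio x 100)."""
    if len(simulated) != len(actual) or len(actual) == 0:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        # Each error is divided by A_t, so a zero actual value makes RMSPE undefined.
        raise ValueError("RMSPE is undefined when an actual value is zero")
    mean_sq = sum(((s - a) / a) ** 2 for s, a in zip(simulated, actual)) / len(actual)
    return 100.0 * math.sqrt(mean_sq)


# Hypothetical data: each simulated value is off by 10% of the actual value.
print(round(rmspe([110.0, 180.0], [100.0, 200.0]), 6))  # 10.0
```

Because each error is scaled by its own actual value, variables measured in very different units (days, people, man-days) can be compared on the same 10% guideline.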
3. Theil Statistics Test 

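The MSE and the RMSPE measure the size of the total error between
the actual and the simulated data. The MSE can also be decomposed into the
Theil statistics (Sterman 1984, Pindyck and Rubenfield 1991) to help
reveal the sources of the error. The sources of error are given in terms of
bias, variance, and covariance. The decomposition of the MSE into the Theil
statistics is as follows:

$$\frac{1}{n}\sum_{t=1}^{n}\left(S_t - A_t\right)^2 = \left(\bar{S} - \bar{A}\right)^2 + \left(s_S - s_A\right)^2 + 2\left(1 - r\right)s_S s_A$$

$$U^M = \frac{\left(\bar{S} - \bar{A}\right)^2}{\mathrm{MSE}}, \qquad U^S = \frac{\left(s_S - s_A\right)^2}{\mathrm{MSE}}, \qquad U^C = \frac{2\left(1 - r\right)s_S s_A}{\mathrm{MSE}}$$

where $\bar{S}$ and $\bar{A}$ are the means of the simulated and actual series, $s_S$ and $s_A$ are their standard deviations (computed with divisor n), and r is the correlation coefficient between simulated and actual data:

$$r = \frac{\frac{1}{n}\sum_{t=1}^{n}\left(S_t - \bar{S}\right)\left(A_t - \bar{A}\right)}{s_S s_A}$$

The proportions UM, US, and UC represent the amount of error in the MSE
due to bias, variance, and covariance, respectively. Note also that UM + US +
UC = 1, as the sum of the three represents the total MSE.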
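The decomposition can be sketched in Python as follows. This is an illustrative implementation under stated assumptions, not the spreadsheet layout used in the thesis: the function name `theil_proportions` is hypothetical, standard deviations use divisor n (which the exact identity requires), and the covariance term is written as 2(s_S s_A - cov), which equals 2(1 - r) s_S s_A but remains defined when one series is constant.

```python
import math
from typing import Sequence, Tuple


def theil_proportions(
    simulated: Sequence[float], actual: Sequence[float]
) -> Tuple[float, float, float]:
    """Return (UM, US, UC): the shares of the MSE due to bias, variance, and covariance."""
    n = len(actual)
    if len(simulated) != n or n == 0:
        raise ValueError("series must be non-empty and of equal length")
    mse = sum((s - a) ** 2 for s, a in zip(simulated, actual)) / n
    if mse == 0:
        raise ValueError("perfect fit: the proportions are undefined when MSE = 0")
    s_mean = sum(simulated) / n
    a_mean = sum(actual) / n
    # Standard deviations use divisor n, which the exact decomposition requires.
    s_sd = math.sqrt(sum((s - s_mean) ** 2 for s in simulated) / n)
    a_sd = math.sqrt(sum((a - a_mean) ** 2 for a in actual) / n)
    cov = sum((s - s_mean) * (a - a_mean) for s, a in zip(simulated, actual)) / n
    um = (s_mean - a_mean) ** 2 / mse
    us = (s_sd - a_sd) ** 2 / mse
    # 2(1 - r) s_S s_A equals 2(s_S s_A - cov), since r = cov / (s_S s_A).
    uc = 2.0 * (s_sd * a_sd - cov) / mse
    return um, us, uc


# Hypothetical data: a constant offset of +5 is pure bias.
um, us, uc = theil_proportions([6.0, 7.0, 8.0, 9.0], [1.0, 2.0, 3.0, 4.0])
print(um)                         # 1.0
print(round(um + us + uc, 6))     # 1.0
```

A constant offset shifts the mean without changing the spread or the correlation, so the whole error lands in UM; purely random scatter around the actual series would instead push most of the error into UC.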

Bias (UM) measures the degree to which the average values of the
simulated and actual values differ. In conventional statistical terms, an
estimate is biased if estimates are made repeatedly and the mean for those
estimates does not approach the actual value of the parameter as the number
of estimates grows (Bush and Mosteller 1955, p. 199). It is therefore more
appealing that a model's estimates be unbiased, that is, that the expected value of
the estimator approach the population value as the number of sample
estimates increases. Large bias (indicated by a large UM and a large MSE) is an
indicator of systematic error between the model and reality and could be
troubling. Systematic error may indicate that some
variable or parameter in the real system is not reflected correctly in the
model. It is unlikely that a model which adequately reflects reality would
produce these results. Bias errors could indicate specification or parameter
errors within the model. On the other hand, not all bias errors are
detrimental to a model. This could be the case if UM is large but the size of
the error itself is small (small MSE/RMSPE), or if acceptable
simplifying assumptions are present. As stated previously, even a large systematic
error may still be acceptable provided that it does
not compromise the purpose of the model. "In terms of testing the validity of
a model...a model should have predictive power, it should be able to
forecast...the degree of precision being sufficient if increased accuracy did not
lead to different conclusions." (Bloomfield 1986, p. 94) If a closer goodness-of-fit
would not provide the user of the model with a clearer
understanding of the software development process, then confidence should
not be adversely affected. It may still be prudent, however, for the modeler to
re-examine the parameters affecting that variable.

The variance proportion (US) measures how well the model's
estimate matches the degree of variability in the actual value. For instance, a
large US suggests that the simulated series has fluctuated considerably while
the actual series has fluctuated very little, or vice versa. A large variance
proportion may also be an indicator of a systematic error.


The covariance proportion (UC) measures the unsystematic error (the
error remaining after deviations from average and average variabilities have 
been accounted for). This portion of error is the least troublesome of the 
three. Unsystematic error suggests that an exogenous event influenced the 
system behavior. The presence of unsystematic error does not compromise a 
model's ability to suit its purpose, as it is not within a model's scope to 
forecast based on random external noise. To do so could defeat the purpose 
for which a model is intended. 

4. Theil Inequality Coefficient Test

The final test which will be employed is the Theil Inequality
Coefficient (Rowland and Holmes 1978, Theil 1961, 1966). The inequality
coefficient is defined as:

$$U = \frac{\sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(S_t - A_t\right)^2}}{\sqrt{\frac{1}{n}\sum_{t=1}^{n}S_t^2} + \sqrt{\frac{1}{n}\sum_{t=1}^{n}A_t^2}}$$

The inequality coefficient (U) will always be between 0 (perfect predictions)
and 1 (worst predictions).
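A short Python sketch of the inequality coefficient, with a hypothetical function name. The bounds follow from the triangle inequality for root-mean-square quantities: the RMS of S - A can never exceed the RMS of S plus the RMS of A, and the upper bound is reached when the simulated series is the exact negative of the actual one.

```python
import math
from typing import Sequence


def theil_inequality_coefficient(
    simulated: Sequence[float], actual: Sequence[float]
) -> float:
    """Theil's U: RMS error divided by (RMS of simulated + RMS of actual)."""
    n = len(actual)
    if len(simulated) != n or n == 0:
        raise ValueError("series must be non-empty and of equal length")
    rms_err = math.sqrt(sum((s - a) ** 2 for s, a in zip(simulated, actual)) / n)
    rms_sim = math.sqrt(sum(s * s for s in simulated) / n)
    rms_act = math.sqrt(sum(a * a for a in actual) / n)
    if rms_sim + rms_act == 0:
        raise ValueError("U is undefined when both series are identically zero")
    return rms_err / (rms_sim + rms_act)


actual = [1.0, 2.0, 3.0]
print(theil_inequality_coefficient(actual, actual))  # 0.0
# A simulated series that mirrors the actual one in sign reaches the upper bound.
print(round(theil_inequality_coefficient([-1.0, -2.0, -3.0], actual), 6))  # 1.0
```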

Of course, with these tools in hand, one must then ask what defines
an acceptable level of goodness-of-fit for instilling confidence in the
model. Research into this area has shown that, within the software
development field, there is no standard of acceptable tolerance that a model of
this nature must meet for it to be deemed "valid" or acceptable. In
general, however, these tests can effectively build confidence in a model
(Barlas 1989, Rowland and Holmes 1978, Sterman 1984) if:


(1) Errors are small (RMSPE less than 10%) and unsystematic 
(concentrated in US and UC). (Sterman 1984) An RMSPE of 10% is used
as the guideline for an acceptable tolerance level in this study and is 
derived from two sources. The first is Sterman (1984, p. 56) "The RMS 
percent errors are below ten percent...While the small total errors in 
most variables show the model tracks the major variables, the several 
large errors might raise questions about the internal consistency of the 
model or the structure controlling those variables." While not 
explicitly stated, an acceptable error tolerance level of 10% is implied 
within his analysis. The other basis is from Veit (1976 p. 540) 
"Generally speaking, if the model can reproduce the historical values 
of key variables within 10% then the structure of the model is probably 
sound. In other words, all of the variables and sectors are linked 
together in such a way that the model is a fair representation of the real 
world...If the structure of the model is correct, it will vary the values of 
the variables at variable rates over time in such a way that they 
reproduce historical data fairly closely." 


(2) Large errors, but due to excluded modes, simplifying assumptions, or 
noise in historical data, such that the nature of the error does not 
adversely impact the model's purpose. (Sterman 1984) 


(3) The Theil Inequality Coefficient (TIC) is less than 0.4: "...one may arbitrarily
identify TIC values above 0.7 as corresponding to rather poor models, 
TIC values between 0.4 and 0.7 for average-to-good models, and TIC 
values below 0.4 as very good or excellent models." (Rowland and 
Holmes 1978, p. 40) 
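The numeric parts of these guidelines can be written down as simple checks. The Python sketch below is hypothetical: the constant and function names are illustrative, and treating exactly 0.4 and 0.7 as "average-to-good" is an assumption, since the quoted bands do not say how boundary values are classified. Criterion (2) is a judgment about the model's purpose and is deliberately not encoded.

```python
from typing import Dict

# Guideline values as stated in the text; boundary handling is an assumption.
RMSPE_GUIDELINE_PERCENT = 10.0
TIC_GOOD = 0.4
TIC_POOR = 0.7


def within_rmspe_guideline(rmspe_percent: float) -> bool:
    """Size part of criterion (1): RMSPE below 10%."""
    return rmspe_percent < RMSPE_GUIDELINE_PERCENT


def tic_band(tic: float) -> str:
    """Rowland and Holmes' (1978) rough bands for the Theil inequality coefficient."""
    if tic < TIC_GOOD:
        return "very good or excellent"
    if tic <= TIC_POOR:
        return "average-to-good"
    return "rather poor"


def largest_error_source(um: float, us: float, uc: float) -> str:
    """Name the Theil proportion carrying the largest share of the MSE."""
    shares: Dict[str, float] = {"bias": um, "variance": us, "covariance": uc}
    return max(shares, key=shares.get)
```

For example, with the DE-A WORKFORCE figures reported in Chapter III (RMSPE 17.6%, UM = .29, US = .30, UC = .41), `within_rmspe_guideline(17.6)` is False and `largest_error_source(0.29, 0.30, 0.41)` returns "covariance"; whether that error is acceptable still depends on the model's purpose.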


The Quattro Pro 3 spreadsheet application, by Borland International, was used
to compute the statistics. A representative spreadsheet layout for each
formula presented and analyzed is given in Appendix A. This analysis will
use these statistics to form the basis of the model evaluation.
II. THE ABDEL-HAMID MADNICK SYSTEM DYNAMICS MODEL OF
SOFTWARE DEVELOPMENT


A. MODEL PURPOSE 

The software systems development model by Abdel-Hamid and Madnick
is based on the feedback principles of system dynamics (Abdel-Hamid and
Madnick 1991). The purpose of the model is to serve as a vehicle which
"...enhances our understanding of, provides insight into, and makes
predictions about the process by which software development is
managed...intended to provide a general understanding of the nature of the
dynamic behavior of a project (e.g., how work force level and productivity
change over time and why) rather than to provide point-predictions (e.g.,
exactly how many errors will be generated.)" (Abdel-Hamid and Madnick,
1989, pp. 1426-1437). Through this model, the developers have endeavored to
provide a means by which managers and researchers can gain a better
understanding of the managerial side of the software development process.
That side of software development has proven to be complicated and is not yet
fully understood by either academia or management professionals.

For this model to accomplish its purpose, it must reasonably portray a 
given software development project as it would actually unfold under given 
management policy decisions and situations. Users of the model must also 
have an acceptable degree of confidence in the model's forecasting ability.
However, the model's purpose is not to make point predictions or to derive
an optimal solution to a given situation. Rather, it is to gain understanding
and insight into the complex process of managing software projects. 

The engineering functions of software development have experienced
significant advances in recent years. Improvements in areas such as
structured programming, structured design, formal verification, language
design for more reliable coding, and diagnostic compilers continue to be
introduced to the field (Abdel-Hamid and Madnick 1989). In contrast, the
managerial side of software development has received relatively little
attention from researchers (Abdel-Hamid and Madnick 1989). This dearth of
research may well be a contributing factor to the managerial problems
which characterize the software industry today. As stated by Brenton R.
Schlender, "...software remains the most complex and abstract activity man
has yet contrived." (Schlender 1989, p. 112) This model also serves to broaden
the range and scope of research conducted in the relatively
brief history of software development.


B. MODEL DEVELOPMENT AND STRUCTURE 

The model was developed from an extensive field study of software 
project managers in five organizations. The study consisted of three 
information gathering steps (Abdel-Hamid and Madnick 1989). The first step 
involved a series of interviews with software project managers at three 
organizations. From the information gathered in this phase and from the 
modelers' own experience in software development, a skeleton of a system 
dynamics software development model was established. The next step was an
extensive literature review, which served to fill many knowledge gaps and
resulted in a more detailed model. The final step was another round of
intensive interviews with software project managers at three organizations.
In this round of interviews, only one of the three project managers was from 
the initial interview group. 

From these three steps, a highly detailed, quantitative simulation model 
was developed which integrates managerial activities (e.g., planning, 
controlling, and staffing) with software production type activities (e.g., design, 
coding, reviewing, and testing). The model contains over one hundred 
causal links and four major subsystems (human resource management, 
software production, control, and planning). It has been designed for use on
medium-sized, organic-type software projects (i.e., projects that are 10,000 to
250,000 lines of code and conducted in familiar in-house environments). For
a detailed discussion of the model's actual structure and formulation, see
Abdel-Hamid and Madnick (1989 and 1991).


III. ANALYSIS OF DE-A AND DE-B PROJECTS


A. DESCRIPTION OF PROJECTS 

One of the initial model validation efforts for the Abdel-Hamid and 
Madnick model (Abdel-Hamid Nov. 1990, Abdel-Hamid and Madnick 1989) 
involved a case study at NASA’s Goddard Space Flight Center (NASA was 
not among the five organizations studied during model development) 
(Abdel-Hamid Nov. 1990). The case study involved the simulation of two 
separate software projects at NASA, the DE-A and DE-B projects. The 
validation procedure used a graphical comparison of actual data against the
model's data. The purpose of both projects was to design,
implement, and test software systems for processing telemetry data and
providing attitude determination and control for NASA's DE-A and DE-B
satellites. The development and target operations machines for both projects
were the IBM S/360-95 and -75, and the programming language was
FORTRAN. Initial project estimates and actual results are given in Table 3-1.


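TABLE 3-1. DE-A AND DE-B INITIAL ESTIMATES AND ACTUALS

Project | Item     | Initial Estimates | Actuals
DE-A    | Cost     | [illegible]       | [illegible]
DE-A    | Schedule | [illegible]       | [illegible]
DE-B    | Cost     | [illegible]       | [illegible]
DE-B    | Schedule | [illegible]       | [illegible]

(Note: DSI = Delivered Source Instructions)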
B. DE-A AND DE-B PROJECT VARIABLES

The analysis of the DE-A and DE-B projects involves a comparison of 
three variables (SCHEDULE estimate, WORKFORCE size, and cost in MAN- 
DAYS) in terms of actual project results versus the model's results. The 
variable comparisons are made at different time intervals throughout the
projects' lifecycles, rather than only at the final outcome, because the
model's purpose is to gain an understanding of the entire software
development process, not just the final result.

The SCHEDULE variable is an estimate of how long it will take to
complete the project from start to finish. For example, on day 40 after the
project had commenced, the project managers estimated that the project
would be complete on the 320th day of elapsed time, whereas on day 280, they
had revised the completion day to the 330th day. Thus, the analysis of the
SCHEDULE variable is a comparison of the project managers' actual
estimated schedule completion time versus the model's estimated schedule
completion time. The WORKFORCE variable represents the desired staffing
level at a given time (comparison of the actual number of staff desired vs. the
model-generated number). The MAN-DAYS variable is a measure of the project's
accumulated cost (in man-days) at a given time (comparison of the actual cost
vs. the model-generated cost).
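Because each comparison is made on the days for which actual values were recorded, the actual and simulated series have to be paired by day before any statistic is computed. A minimal Python sketch, assuming (hypothetically) that each series is stored as a mapping from elapsed day to value; the thesis does not describe its data layout beyond the appendix tables, and the function name is illustrative.

```python
from typing import Dict, List, Tuple


def align_on_days(
    actual: Dict[int, float], model: Dict[int, float]
) -> Tuple[List[int], List[float], List[float]]:
    """Pair actual and model values on the days present in both series.

    Returns (days, simulated, actual) so the lists line up index by index.
    Days recorded in only one series are dropped rather than interpolated.
    """
    days = sorted(set(actual) & set(model))
    return days, [model[d] for d in days], [actual[d] for d in days]


# Actual SCHEDULE estimates from the example above (day 40 -> 320, day 280 -> 330);
# the model values here are hypothetical.
actual_schedule = {40: 320.0, 280: 330.0}
model_schedule = {40: 318.0, 120: 325.0, 280: 331.0}
print(align_on_days(actual_schedule, model_schedule))
# ([40, 280], [318.0, 331.0], [320.0, 330.0])
```

Dropping unmatched days is one design choice; interpolating the model output to the observation days would be another, and it would change the computed statistics slightly.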

1. DE-A Project Results and Analysis 

The input data tables used to calculate the statistics for the actual
results and the model results for each of the variables are given in Appendix B.
Table 3-2 provides the RMSPE, MSE, Theil's Inequality Statistics, and Theil's
Inequality Coefficient for each of the DE-A project variables.


TABLE 3-2. ERROR ANALYSIS OF DE-A PROJECT

                                   INEQUALITY STATISTICS
Variable  | RMSPE (%) | MSE         | UM  | US  | UC  | TIC
SCHEDULE  | .98       | 10.6        | .01 | .28 | .71 | .00
MAN-DAYS  | 9.3       | 22178       | .04 | .12 | .84 | [illegible]
WORKFORCE | 17.6      | [illegible] | .29 | .30 | .41 | [illegible]

As can be seen from Table 3-2, SCHEDULE and MAN-DAYS have an RMSPE
below 10%, while WORKFORCE is above the 10% level. All three variables
have a TIC value well below the .40 level.

The SCHEDULE variable shows an extremely low RMSPE (.98%), 
indicating that the difference between the actual results and the model results 
is very small. This indicates that the model matched very well with the 
actual schedule estimates made by the project managers. On average, the 
model differed from the actual estimates by only three days (square root of the 
MSE). The decomposition of the MSE into the inequality statistics reveals 
that the source of the small error was unequal covariance (unsystematic 
error). As such, the nature of the error is not a major concern since the 
model's purpose is not point prediction. The two series are plotted in Figure
3-1.

Figure 3-1. DE-A SCHEDULE Actual vs. Model

The MAN-DAYS variable shows a 9.3% difference, on average,
between the actual cost and the model's forecasted cost over the project's
duration. In absolute terms, this equates to an average difference between the
model cost and actual cost of 149 man-days (square root of the MSE). This is
below the 10% error tolerance level and suggests that the structure
of the model is sound. The inequality statistics suggest that the majority of
the error is unsystematic (i.e., 84% of the error is due to unequal covariation),
which is quite acceptable. Additionally, the simulated cost trend matches the actual
cost trend quite well. This can also be seen graphically in Figure 3-2, where
the point-by-point differences are obvious, but the general slopes appear to be
very close.
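As a quick arithmetic check on the figures quoted above, the DE-A MAN-DAYS values (MSE = 22178, UC = .84) give the RMS error and the covariance share of the MSE as follows:

$$\sqrt{\mathrm{MSE}} = \sqrt{22178} \approx 148.9 \approx 149 \text{ man-days}$$

$$U^C \cdot \mathrm{MSE} \approx 0.84 \times 22178 \approx 18{,}630 \ (\text{man-days})^2$$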
Figure 3-2. DE-A MAN-DAYS Actual vs. Model


The WORKFORCE variable displays the largest RMSPE at 17.6%. In 
actual terms, the model shows an average difference from the actual 
workforce size of .95 people over the course of the project's life. The 
inequality statistics do not indicate that the error is concentrated in any one
source. Rather, the error is spread fairly evenly across the three sources.
Although the largest share of the error (41%) is in unequal covariation, and thus
unsystematic in nature, it does not dominate. That 29% and 30% of the error is
due to bias and variance, respectively, could be of concern because both are potential
indicators of systematic error. This could potentially compromise the
model's usefulness. In this case, however, the trend of the model matches
that of the actual data very closely, and the difference between the average
values of the two series (the error due to bias) is small enough not to
adversely impact the purpose of the model. The reasoning behind this is that
the purpose of the model is to provide insight into the dynamic behavior of a
project, not point prediction. Any adjustment of the model's parameters to
make a closer fit would not necessarily increase one's ability to glean further
insight or understanding (Bloomfield 1986). Therefore, the error in this case
is unsystematic with respect to the purpose of the model. Plotting the model
results vs. the actual results (Figure 3-3) shows the small differences in the
point-by-point match and highlights the very similar trend pattern of each
series.
2. DE-B Project Results and Analysis 

The input data tables for the actual results and the model results for
each of the variables are given in Appendix C. Table 3-3 provides the
calculated RMSPE, MSE, Theil's Inequality Statistics, and Theil's Inequality
Coefficient for each data set of the DE-B project variables being analyzed.

TABLE 3-3. ERROR ANALYSIS OF DE-B PROJECT

                                   INEQUALITY STATISTICS
Variable  | RMSPE (%) | MSE         | UM          | US          | UC  | TIC
SCHEDULE  | 2.5       | 64.3        | .68         | .02         | .30 | [illegible]
MAN-DAYS  | 2.8       | [illegible] | [illegible] | .28         | .64 | [illegible]
WORKFORCE | 11.0      | [illegible] | [illegible] | [illegible] | .72 | [illegible]

As can be seen from Table 3-3, SCHEDULE and MAN-DAYS have an RMSPE
well below 10%, while WORKFORCE is above the 10% level, as was the case
with the DE-A project. Additionally, all three variables have a TIC value well
below the .40 level. Each variable will be discussed separately and analyzed
using the inequality statistics.


Figure 3-3. DE-A WORKFORCE Actual vs. Model





SCHEDULE shows a very low RMSPE, indicating that the magnitude
of the error is very small and that the model matched the real system quite
well. The inequality statistics reveal that the major source of the error can be
attributed to bias, a possible systematic difference between the model and
reality, which is a potential problem. The graph of the two series (Figure 3-4)
shows that the project managers did not adjust their schedule estimates until
day 260. According to DeMarco (1982), this is typical: "Once an original
estimate is made, it's all too tempting to pass up subsequent opportunities to
estimate by simply sticking with your previous numbers. This often happens
even when you know your old estimates are substantially off. There are a few
possible explanations for this effect: It's too early to show slip...If I re-estimate
now, I risk having to do it again later (and looking bad twice)...As you can see,
all such reasons are political in nature." The model does in fact take this
system component into account, and the small error size is evidence of its
presence: the model captures the major portion of this component. The
simplifying assumptions in this regard do not jeopardize the model's
purpose, as they do not degrade any general understanding of the nature of the
SCHEDULE estimates within the system.

Figure 3-4. DE-B SCHEDULE Actual vs. Model

MAN-DAYS, like SCHEDULE, has a very low RMSPE (2.8%),
indicating that the magnitude of this error is also very small and that it also 
approximates reality quite well. Additionally, the inequality statistics show 
that the preponderance of the error is concentrated in unequal covariance 
(64%) and variance (28%). The small and unsystematic error does not in any 
way detract from the model's ability to serve its purpose. Specifically, the 
small impact of outside noise does not affect a user's ability to gain insight 
into the cost structure, reflected by the model's MAN-DAYS variable. A plot 
of the model vs. the actual cost (Figure 3-5) helps to illuminate the model's 
ability to match reality for this variable. 

As with the DE-A project, WORKFORCE displays the highest RMSPE
(11%) of the three variables for the DE-B project, although in this project the
error is smaller than in the DE-A project. The source of the error is
concentrated mainly in the unequal covariance proportion (72%) and is an
unsystematic type of error. Once again, the model captures the general trend
of the real system, even though it varies on a point-by-point basis (see Figure
3-6). This does not detract in any way from the model's ability to demonstrate
the dynamic nature of the workforce structure during the project's lifecycle.


Figure 3-5. DE-B MAN-DAYS Actual vs. Model
Figure 3-6. DE-B WORKFORCE Actual vs. Model




C. CONCLUSIONS

The final statistical measure to be addressed for both projects is the TIC. As 
can be seen from Tables 3-2 and 3-3, the TIC values for both projects are below 
.10. All of these values are well below the general guideline given by Rowland 
and Holmes, under which TIC values below .40 correspond to a very good or 
excellent model: "...one may arbitrarily identify TIC values above 0.7 as 
corresponding to rather poor models, TIC values between 0.4 and 0.7 for 
average-to-good models, and TIC values below 0.4 as very good or excellent 
models." (Rowland and Holmes 1978, p. 40) Thus, the analysis of TIC values for 
all variables indicates that the model performs extremely well at portraying 
the reality of the given software development projects. None of the tests 
conducted detracted from the model's ability to suit its purpose: every error 
was either small (RMSPE under 10%) and unsystematic, or larger but still 
unsystematic, and all TIC values were well below .40. Consequently, the 
foregoing tests should serve only to build confidence in the model's utility for 
understanding the software development process. 
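As an illustration only (the thesis performed these calculations in Quattro Pro, not Python), the following minimal sketch computes the TIC in its usual form, U = RMSE / (RMS(S) + RMS(A)), and maps it onto the Rowland and Holmes rule of thumb quoted above. The function names are hypothetical, the treatment of the exact boundary values 0.4 and 0.7 is an assumption (the quotation does not specify it), and the sample series are invented, not project data.

```python
import math
from typing import Sequence


def theil_inequality_coefficient(simulated: Sequence[float], actual: Sequence[float]) -> float:
    """Theil's inequality coefficient U = RMSE / (RMS(S) + RMS(A)); 0 is a perfect fit, 1 the worst."""
    if len(simulated) != len(actual) or len(simulated) == 0:
        raise ValueError("series must be non-empty and of equal length")
    n = len(simulated)
    rmse = math.sqrt(sum((s - a) ** 2 for s, a in zip(simulated, actual)) / n)
    rms_s = math.sqrt(sum(s * s for s in simulated) / n)
    rms_a = math.sqrt(sum(a * a for a in actual) / n)
    if rms_s + rms_a == 0:
        raise ValueError("both series are identically zero")
    return rmse / (rms_s + rms_a)


def rowland_holmes_rating(tic: float) -> str:
    """Rule of thumb from Rowland and Holmes (1978); boundary handling is an assumption."""
    if tic < 0.4:
        return "very good or excellent"
    if tic <= 0.7:
        return "average-to-good"
    return "rather poor"


# Hypothetical series, not data from the DE-A or DE-B projects.
model_values = [10.0, 20.0, 30.0]
actual_values = [12.0, 18.0, 33.0]
u = theil_inequality_coefficient(model_values, actual_values)
print(f"TIC = {u:.3f} ({rowland_holmes_rating(u)})")
```

Because the denominator adds the two root-mean-square magnitudes, U is scale-free and bounded between 0 and 1, which is what allows a single set of cut-off values to be applied across variables as different as SCHEDULE and MAN-DAYS.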

IV. ANALYSIS OF RONAN'S AND BAKER'S EXPERIMENTS 


A. THE DIRECT EXPERIMENTATION METHOD 

Another method of testing a model's goodness-of-fit, besides direct 
comparison with actual historical data, is direct experimentation. Direct 
experimentation uses an interactive game based on the model being tested. 
Subjects in the game assume a given role in the system that has been modeled 
and are required to make one or more specific decisions. The subjects are 
placed in the same decision-making setting assumed in the model, receive the 
same information set as the model, and try to meet the same goals as the 
model. The subjects are free to make their decisions in any way they choose, 
whereas the model's decision is based on the explicit rule set contained in its 
structure. It is then possible to compare the decisions made by the subjects in 
the experiment with those made by the model. This comparison can be used to 
confirm or disconfirm the decision rule contained in the model and thus 
promote confidence in the model (Sterman 1987). 

The goodness-of-fit tests used in the previous chapter measured how well 
the model reproduced three variables (SCHEDULE, MAN-DAYS, and 
WORKFORCE) of the real system it was designed to represent. Direct 
experimentation provides another measure of goodness-of-fit from a somewhat 
different perspective: it compares the subjects' behavior with that of the 
model, for given variables or decision rules, within the same environment. 
The assumptions underlying the model environment must also hold in the 
subjects' experimental environment; direct experimentation will not reveal 
whether those assumptions are incorrect. It can, however, promote confidence 
in the model by showing that "... given the institutional structure people 
behave the same way the model presumes them to behave." (Sterman 1987, 
p. 1577) Therefore, direct experimentation can serve as a useful tool for 
examining the accuracy of a given decision rule for a given variable's output. 

The same statistics used to evaluate goodness-of-fit for the DE-A and DE-B 
projects (RMSPE, MSE, Theil's Inequality Statistics, and TIC) will be used in 
the following direct experimentation comparisons. Additionally, an alternative 
method of analyzing how closely the model matches the subjects is proposed. 
This method is based on the work of Sterman (1987 and 1989) and will be 
introduced and applied to a subset of Baker's experiment (Baker 1992). The 
main focus of the analysis, however, will remain on the previously defined 
statistics. The proposed computation at each time t, summed over the 
experimental subjects, is as follows: 

$$\sqrt{\sum (A_t - S_t)^2}$$

where 

S_t = Simulated (model) value at time t 

A_t = Actual (experimental subject) value at time t 

The purpose of the proposed measure is to examine the computed value at 
each time t and to analyze how that value changes over the entire lifecycle of 
the project. 
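The proposed measure can be sketched in Python as follows. This is a minimal illustration, not the spreadsheet used in the thesis, and it assumes (following the description of the Appendix F spreadsheet as the "square root of the sum of squared errors for time t") that the sum runs over the individual subjects at each time t. The function name, subject labels, and numbers are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def proposed_measure(subjects: Dict[str, Sequence[float]], model: Sequence[float]) -> List[float]:
    """Return sqrt(sum over subjects of (A_t - S_t)^2) for every time index t."""
    for name, series in subjects.items():
        if len(series) != len(model):
            raise ValueError(f"series for {name} does not match the model length")
    values = []
    for t, s_t in enumerate(model):
        # Squared gap between each subject's decision and the model's decision at time t.
        squared = sum((series[t] - s_t) ** 2 for series in subjects.values())
        values.append(math.sqrt(squared))
    return values


# Hypothetical staffing decisions at t = 0, 40, 80 days (not Baker's data).
model_workforce = [5.0, 5.5, 6.0]
subject_workforce = {
    "subject_1": [5.0, 6.0, 7.0],
    "subject_2": [5.0, 5.0, 6.0],
}
for t, value in zip([0, 40, 80], proposed_measure(subject_workforce, model_workforce)):
    print(t, round(value, 3))
```

Returning one value per time step, rather than a single summary number, is what lets the analyst look at how the disagreement between the model and the subjects evolves over the project lifecycle.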
B. RONAN'S EXPERIMENT 


1. Experiment Description 

The basic design of Ronan's experiment ("Experiment two") was to 
use graduate students as surrogate software project managers for decision-
making purposes (Ronan 1990). The subjects used the gaming interface of 
Abdel-Hamid and Madnick's System Dynamics Model of Software Project 
Management (SDM) to input decisions and to receive feedback at each 
decision-making interval (once every 20 days). The experiment was designed 
to create identical SDM projects that differed only in the initial man-day cost 
estimate. The initial constraints of the software project the subjects worked 
with were based on the DE-A project, and all other project variables within 
the SDM were identical. At each interval, the subjects decided on the desired 
staffing level for the remainder of the project, based on information generated 
by the SDM gaming interface. Their goal was to choose the staffing level they 
felt would allow the project to finish on an acceptable schedule while avoiding 
excessive cost overrun. 

Ronan's objective was to compare the desired staffing level decisions 
of software project managers managing identical projects throughout the 
development phase, where the only difference was whether the man-day cost 
was initially under-estimated, over-estimated, or perfectly estimated. The 
subjects were divided into four groups of 8 or 9 students, each designated by a 
"G-number". The group with the perfectly estimated initial cost was 
designated "G-1900", for an initial estimate of 1900 man-days. Two groups 
received over-estimated initial costs, "G-2185" and "G-2470". The 
under-estimated group was "G-1460". 

Each subject within each group made his own staffing decision based 
on the initial conditions and the subsequent information provided by the 
SDM gaming interface. A desired staffing level for each of the four groups at 
each interval was then computed from the combined decisions of the subjects 
within that group. 

The same initial conditions that were provided to the subjects were then 
input into the SDM, so that the model's workforce level decisions could be 
compared with those of each of the four student groups. 

2. Ronan Experiment Results and Analysis 

The input data tables for the actual results and the model results for 
each of the variables are given in Appendix D. Table 4-1 provides the 
calculated summary statistics (RMSPE, MSE, Theil's Inequality Statistics, and 
TIC) for each data set of Ronan's Experiment. 


TABLE 4-1. ERROR ANALYSIS OF RONAN'S EXPERIMENT 
(WORKFORCE LEVEL) 


INEQUALITY STATISTICS 





As can be seen from Table 4-1, the TIC values for each of the groups are well 
below the .40 level, suggesting that the model does an excellent job of 
matching the subjects' decisions. The RMSPE ranges from a low of 11% to a 
high of 21.5%. This error range for the WORKFORCE variable in Ronan's 
experiment is not unlike the error exhibited in the DE-A and DE-B projects for 
the same variable (17.6% and 11%, respectively). The subjects' actual values 
versus the model values are plotted in Figures 4-1, 4-2, 4-3, and 4-4 for 
G-1460, G-1900, G-2185, and G-2470, respectively. Because all of the RMSPE 
values exceed the 10% level, they merit further analysis. 

In general, the inequality statistics do not show clearly unsystematic 
error. For the group with the initially underestimated cost (G-1460), the 
majority of the error is concentrated in covariance, which does indicate 
unsystematic error. For the remaining three groups, however, much of the 
error is concentrated in the bias proportion, which could indicate systematic 
error between the model and the experimental groups. A large, systematic 
error could be troublesome, as it would limit the model's usefulness as a 
research and education tool, or at the least raise questions about its 
usefulness. 

One possible explanation for the bias between the model and the 
student subjects lies in the difference between the subjects' environment and 
the model environment. The experiment strove to place the subjects in the 
same environmental context on which the model is based. The model, 
however, is not designed to mimic the environment the students are in. 
Therefore, various environmental factors may affect the students' decisions 
without being reflected in the model's parameters. Whether it would be 
important to adjust the model to account for these factors, if indeed they could 
be identified, would be at the discretion of the model user. One would have to 
consider the impact of any calibration, as it could corrupt the structure of the 
software project environment being modelled. 


Figure 4-1. Ronan Experiment G-1460 Actual vs. Model 

Figure 4-2. Ronan Experiment G-1900 Actual vs. Model 

Figure 4-3. Ronan Experiment G-2185 Actual vs. Model 

Figure 4-4. Ronan Experiment G-2470 Actual vs. Model 

As an example, the software engineering curriculum at the subjects' 
institution introduced the lesson of Brooks' Law with some emphasis. Brooks' 
Law states that adding more people to a late project only makes it later 
(Brooks 1975). The subjects' knowledge of Brooks' Law could explain the 
WORKFORCE level decisions made in the early stages of each group's project. 
As seen in Figures 4-1, 4-2, 4-3, and 4-4, each of the groups tended to add more 
people early in the project. This could be interpreted as a way of avoiding the 
crux of Brooks' Law: by adding people early, a subject would not have to face 
the dilemma of Brooks' Law later on, because the problem of a late project 
would be avoided altogether. The figures also show that the model's 
WORKFORCE decisions were well below those of the subjects during those 
initial stages. The students' knowledge of Brooks' Law alone could explain a 
large portion of the error between the model and the subjects. There is, 
however, no evidence to support this explanation; it is intended only as an 
example. 

The existence of this error should not, however, degrade confidence 
in the model, provided that the modeler or model user recognizes the 
importance of the simulation environment. To a certain extent, it may be 
possible to calibrate the SDM gaming interface to reflect the subjects' 
environment more closely without disturbing the integrity of the software 
development environment that the model is designed to emulate. Of course, 
the extent of any calibration would depend greatly on the intended purpose of 
the experiment. 


C. BAKER'S EXPERIMENT 


1. Experiment Description 

The design of Baker's experiment is essentially the same as Ronan's 
(Baker 1992). Graduate students served as surrogate software project 
managers; they used the SDM gaming interface to input their decisions and to 
receive updated status reports on the software project; and the initial 
constraints of their software project were based on the DE-A project. 

In Baker's experiment there were two groups of subjects (Group A 
and Group B). Each group started with the same initial conditions and the 
same objective: to complete the project as close as possible to the original 
estimates of schedule and cost. The difference between the two groups was 
that Group A's project grew gradually in size from 320 tasks (one task equals 
approximately 50 lines of code) to 610 tasks by day 100 of the project 
simulation. Group B's project size remained at 320 tasks through the 100th 
day of the project simulation; after the Day 80 status report (decisions were 
made at 40-day intervals), the subjects received a message on their screen that 
the project size had just been increased to 610 tasks because of increased 
requirements. The project size then remained constant for the remainder of 
the project simulation for both groups. At each simulated 40-day interval, the 
subjects were required to input two decisions: staffing level and project cost 
estimate. 


2. Baker Experiment Results and Analysis 
The input data tables for the actual results and the model results for 
each of the variables are given in Appendix E. Tables 4-2 and 4-3 provide the 
RMSPE, MSE, Theil's Inequality Statistics, and TIC for each data set of Baker's 
Experiment. 


TABLE 4-2. ERROR ANALYSIS OF BAKER'S EXPERIMENT (GROUP A) 


INEQUALITY STATISTICS 
















The statistics presented in Tables 4-2 and 4-3 do not differ dramatically from 
those in Ronan's experiment. The TIC values for all of the variables are well 
below the .40 level, indicating a very good or excellent model. The RMSPE 
values for WORKFORCE are somewhat high, although not significantly higher 
than in Ronan's experiment. The RMSPE values for MAN-DAYS straddle the 
.10 level, indicating that the model structure as it relates to MAN-DAYS is 
probably sound. Additionally, as in the Ronan experiment, the breakdown of 
the inequality statistics does not clearly reveal the errors to be unsystematic. 
Plots of the model-generated decisions versus the student-generated decisions 
for Group A WORKFORCE, Group A MAN-DAYS, Group B WORKFORCE, and 
Group B MAN-DAYS are displayed in Figures 4-5, 4-6, 4-7, and 4-8, 
respectively. Essentially the same discussion presented in the analysis of 
Ronan's experiment, regarding the simulation environment and Brooks' Law, 
also applies to Baker's experiment and will not be repeated. Therefore, even 
though the errors could be construed as large (RMSPEs above 10%) and 
possibly systematic, their nature can be acceptable to a user of the model. As 
such, these errors do not necessarily degrade one's confidence in the model. 

The computations for the proposed alternative measure for the Group A 
WORKFORCE variable are presented in Table 4-4. The intent of this measure 
is to analyze how the difference between the model and the experiment 
subjects changes over a project's lifecycle for a given variable. This is only a 
proposed measure, however, and its suitability requires further analysis. It is 
presented here to serve as a basis for further research. The computation for 
each value at time t is as follows: 

$$\sqrt{\sum (A_t - S_t)^2}$$

As with the previous statistics, this measure was computed using the Quattro 
Pro 3 spreadsheet application. The actual input values and the spreadsheet 
documentation used to derive the values for the WORKFORCE variable of 
Group A are given in Appendix F. 
Figure 4-5. Baker Experiment Group A WORKFORCE Actual vs. Model 

Figure 4-6. Baker Experiment Group A MAN-DAYS Actual vs. Model 

Figure 4-7. Baker Experiment Group B WORKFORCE Actual vs. Model 

Figure 4-8. Baker Experiment Group B MAN-DAYS Actual vs. Model 

TABLE 4-4. COMPUTED VALUE AT TIME INTERVAL t 


0 | 40 | 80 | 120 | 160 | 200 | 240 | 280 
D. CONCLUSIONS 

The analysis of the test statistics for the direct experimentation method is 
not as clear as it was for the comparison with actual project results. The 
results were somewhat mixed, which suggests that further research would be 
prudent. The TIC values, on all counts, suggest that the model is an excellent 
one in terms of forecasting ability. The RMSPE values and the breakdown of 
the inequality statistics, however, indicate that caution and a thorough 
understanding of the purpose and use of the model are essential. While this 
may seem intuitively obvious regardless of the validation results, it is still 
worth noting. Additionally, an alternative measure was presented for 
analyzing how the error between the model and the experimental subjects 
changes over a project's lifecycle. Further research is required to explore the 
potential of this proposed measure. 
V. CONCLUSIONS 


A. ACCOMPLISHMENTS 

This research used several statistical measures to evaluate the 
goodness-of-fit of the Abdel-Hamid and Madnick system dynamics model of 
software development (Abdel-Hamid and Madnick 1991), and thereby 
increased confidence in the model's validity and utility. Of course, how much 
confidence one has in the model depends greatly upon one's view of the model 
and its relationship to reality. There are basically two kinds of potential users 
of the model. The first are software project managers, who would use the 
model to study the effects of varying management policies and decisions 
relevant to a given project's lifecycle. The second are academics, whose 
primary purposes for using the model would be to gain an understanding of 
the complexities of the software development process and to use it as a 
teaching tool. 

These users, then, must have confidence that the model is a reasonably 
true reflection of reality if they are to make use of it. If the model's 
goodness-of-fit is not adequate to suit their needs or expectations, that alone 
may be reason enough to discard the model. The goal of this research effort 
was to complement the earlier validation steps with a battery of statistical 
measures. The approach taken to achieve this goal was to incorporate general 
statistical measures used by others in the system dynamics field (TIC and 
RMSPE). Under these guidelines, a TIC below .40 corresponds to a very good 
or excellent model (Rowland and Holmes 1978), and an RMSPE below .10 
indicates that the model's structure is probably sound (Veit 1978). 
The error between the real system and the model was then broken down into 
its sources (bias, variance, and covariance) using the Theil inequality 
statistics. From there, the type of error was determined. Was it an error that 
the model should reasonably be able to capture (systematic error)? Or was it 
caused by some influence outside the system being modeled (unsystematic 
error), so that it is reasonable for the model not to capture it? If the error was 
determined to be systematic, the modeler may need to reexamine the 
estimation parameters. If the error was unsystematic, the user must accept 
that the model cannot reasonably be expected to capture the exogenous 
influences. In the case of unsystematic errors, the modeler could insert dummy 
variables into the model to obtain a closer goodness-of-fit. Doing so, however, 
may upset the feedback structure inherent in a system dynamics model, and 
the model would no longer reflect the system it is attempting to emulate. This 
would defeat the purpose of creating the model. 
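The decomposition described above can be sketched in Python as follows. This is a minimal illustration rather than the thesis's Quattro Pro implementation; the function names are hypothetical, and it assumes standard deviations and covariance are computed with divisor n (the population form), under which UM + US + UC sums exactly to one. The covariance term 2(1 - r)·sd_S·sd_A is computed in the algebraically equivalent form 2(sd_S·sd_A - cov) so that a constant series does not cause a division by zero.

```python
import math
from typing import Sequence, Tuple


def theil_proportions(simulated: Sequence[float], actual: Sequence[float]) -> Tuple[float, float, float]:
    """Split the MSE into bias (UM), unequal variation (US) and unequal covariation (UC)."""
    n = len(simulated)
    if n == 0 or n != len(actual):
        raise ValueError("series must be non-empty and of equal length")
    mean_s = sum(simulated) / n
    mean_a = sum(actual) / n
    mse = sum((s - a) ** 2 for s, a in zip(simulated, actual)) / n
    if mse == 0:
        raise ValueError("perfect fit: the proportions are undefined")
    # Population (divide-by-n) statistics, so that UM + US + UC = 1.
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in simulated) / n)
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in actual) / n)
    cov = sum((s - mean_s) * (a - mean_a) for s, a in zip(simulated, actual)) / n
    um = (mean_s - mean_a) ** 2 / mse
    us = (sd_s - sd_a) ** 2 / mse
    # 2(1 - r) * sd_s * sd_a, rewritten as 2(sd_s * sd_a - cov).
    uc = 2.0 * (sd_s * sd_a - cov) / mse
    return um, us, uc


def largest_source(um: float, us: float, uc: float) -> str:
    """Name the dominant proportion; a large UM or US hints at systematic error."""
    labels = {"bias (UM)": um, "variance (US)": us, "covariance (UC)": uc}
    return max(labels, key=labels.get)


# Hypothetical series in which the model is consistently one unit too high.
um, us, uc = theil_proportions([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
print(f"UM={um:.2f} US={us:.2f} UC={uc:.2f} -> {largest_source(um, us, uc)}")
```

In this invented example the whole error is a constant offset, so it falls entirely into the bias proportion, which is the pattern the chapters above treat as a possible sign of systematic error.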

This research also compared the model with the results of direct 
experimentation. While this type of comparison differs somewhat from a 
comparison with actual project results, it provides a useful analysis tool. The 
experimentation analysis, while not as clear-cut as the comparison with actual 
results, did not undermine the confidence one would place in the model. The 
experimentation analysis also introduced another measure based on work by 
Sterman (1987 and 1989), which has potential for use in future analysis of the 
model. 

The results of this research have demonstrated that the system dynamics 
model of software development by Abdel-Hamid and Madnick displays a 
reasonable and acceptable degree of goodness-of-fit. This study should 
therefore serve to build confidence in the model's usefulness as a tool for 
providing a better understanding of the managerial aspects of the software 
development process. 


B. FUTURE DIRECTION 

The process of model validation should be evolutionary. Because a 
system dynamics model is itself dynamic, the confidence-building (validation) 
process should also be dynamic in order to keep pace with the model. No 
single test can or should be used for validation. Rather, a variety of tests 
should be used to assess the suitability and utility of a model for its given 
environment. As the model grows and adapts to its environment, so too 
should the testing process. 

There are many directions one could take in future testing of this 
model. Several are given here: 


(1) Collect results from other software projects that are suited to this 
particular model and conduct the same type of analysis as presented in 
this thesis. 


(2) Conduct further analysis of the direct experimentation method using 
the tests of significance presented by Sterman (1989). 


(3) Utilize the six-step behavior validation procedure presented by Barlas 
(1989) on the DE-A and DE-B projects. 


(4) Utilize spectral analysis techniques to compare the DE-A and DE-B 
project results to the model results. 


(5) Conduct further research into the proposed analysis measure presented 
in Chapter IV to determine its suitability and applicability to system 
dynamics models. 
APPENDIX A 


The following are the cell formulas, as computed in the Quattro Pro Version 3.0 
spreadsheet application. Only the formulas for the DE-A project are 
presented. The computations for the other projects and experiments differ 
only in the data set; all other formulas remain the same. 

Three spreadsheets were built: an MSE and RMSPE spreadsheet, a Theil's 
Inequality Statistics spreadsheet, and a TIC spreadsheet. The MSE and RMSPE 
spreadsheet (FORMMSE) was the base spreadsheet, where all of the initial data 
entry was made. The other two spreadsheets linked to the first with the link 
[FORMMSE] followed by the specific cell address. 


1. Computations for Mean Square Error (MSE) and Root Mean Square 
Percentage Error (RMSPE) 


A1: (T) [W17] 'PROJECT:
B1: (T) 'DE-A
A2: (T) [W17] 'Mean Square Error (MSE) of
D2: (T) [W9] 'MAN-DAYS
E3: (T) [W15] ' Data
A5: (T) [W17] 't
B5: (T) 'St
C5: (T) [W7] 'At
D5: (T) [W9] 'St-At
E5: (T) [W15] '(St-At)^2
F5: (T) [W9] 'St-At/At
G5: (T) [W15] '(St-At/At)^2
me: (T) [W117] 0 

Bio. {Ty ili 

wo: (1) [W7) 1111 

D6: (T) [W9] +B6-C6
E6: (T) [W15] +D6^2
F6: (T) [W9] +D6/C6
G6: (T) [W15] +F6^2
(T) 
ay 
(T) 
(T) 
(T) 
on 
(T) 
(T) 
(T) 
(T) 
t) 
(T) 
(ee 
(T) 
1) 
1) 
(T) 
(T) 
a) 
a) 
(T) 
(T) 
(T) 
(T) 
ey, 
(T) 
(T) 
(T) 
(T) 
(T) 
(T) 
(T) 
(T) 
(T) 
(T) 
(T) 
(T) 
(T) 
(T) 
(T) 
(T) 
(T) 
(T) 
(T) 
(T) 
(T) 
a) 
cl} 
a) 
(T) 
(T) 


{[Ww17] 60 
ela 
iwi | de 
(W9]) +B7-C7 
(W15]) +D7%*%2 
(W9]) +D7/C7 
(W15] +F7%*2 
(W17] 80 
Tr Se 
[Wills 3G 
[w9] +B8-C8 
{[Ww15) +D8%2 
(W9) +D8/C8 
(W15] +F8%2 
(W17] 150 
122 5a 
[W7}) 1336 
{[w9] +B9-C9 
{[w1i5]) +D9%*2 
(W9]) +D9/C9 
(W15] +F9%2 
[W117] 240 
1428.3 
[Wi L582 
[W939] +B10-C10 
[Wilo)- +Dil0ez2 
[w9} +D10/C10 
(W15] +F10%2 
[w17] 280 
1461.5 
(W7] 1582 
(W929) +B11-Clil 
(W15) +D11%2 
(w9} +D11/Cl11 
(W15] +F11%2 
(W17] 300 
1637 55 
[Wy | elosw 
(W9] +B12-C12 
(W15]) +D12%*2 
[Ww9) +D12/C12 
(W15) +F12%°*2 
(W17] 340 
POS US 
(W7]) 1750 
(W939) +B13-C13 
(W15]) +D13%°2 
(W9]) +D13/C13 
(W15] +F13%2 
(W17] 360 
2029.9 


Cla ect) [W7) 1769 

pid (1): [WS] +B1l4—-Cl14 

Brae (rT) [Wi5) +D14%2 

F144: (T) [W9] +D14/C14 

G14: (T) [W15]) +F14%2 

A15: (T) [W17] 380 

Eaoceme) 2021.5 

Soo.) TW?) 2239 

mies tl) [WO] +B1lS-C15 

Pago (1) 6[W15)] +D15%2 

Pisce (f) [WI] +D1I5/C15 

eo, (1) [W15] +F15%2 

A16: (T) [W17] @COUNT(A6..A15)
E16: (T) [W15] @SUM(E6..E15)
G16: (T) [W15] @SUM(G6..G15)
A18: (T) [W17] 'MSE=
B18: (T) +E16/A16
F18: (T) [W9] 'RMSPE=
G18: (T) [W15] (G16/A16)^0.5
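For readers who want to reproduce the MSE and RMSPE outside a spreadsheet, the following is a minimal Python sketch. Python is not part of the original work and the function name is hypothetical; the sketch simply mirrors the column labels above (St-At, its square, (St-At)/At, and its square) and averages over the number of data points, as the COUNT-based cells do.

```python
import math
from typing import Sequence, Tuple


def mse_and_rmspe(simulated: Sequence[float], actual: Sequence[float]) -> Tuple[float, float]:
    """Return (MSE, RMSPE) for paired model (St) and actual (At) values."""
    n = len(simulated)
    if n == 0 or n != len(actual):
        raise ValueError("series must be non-empty and of equal length")
    sum_sq_error = 0.0      # running total of (St - At)^2
    sum_sq_pct_error = 0.0  # running total of ((St - At) / At)^2
    for s_t, a_t in zip(simulated, actual):
        if a_t == 0:
            raise ValueError("RMSPE is undefined when an actual value is zero")
        diff = s_t - a_t
        sum_sq_error += diff ** 2
        sum_sq_pct_error += (diff / a_t) ** 2
    mse = sum_sq_error / n                     # MSE = sum of squared errors / n
    rmspe = math.sqrt(sum_sq_pct_error / n)    # RMSPE = sqrt(sum of squared % errors / n)
    return mse, rmspe
```

Note that the RMSPE divides by the actual value At, so it expresses each error relative to reality; this is why the 10% threshold discussed in the text can be applied to variables measured in different units.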


2. Computations for Theil's Inequality Statistics (UM, US, and UC) 


A1: (T) [W15] 'PROJECT:
B1: (T) [W33] 'DE-A
A2: (T) [W15] 'THEIL Inequality Statistics for
D2: (T) [W12] 'MAN-DAYS
E3: (T) ' Data
A5: (T) [W15] '(1)
B5: (T) [W33] '(2)
C5: (T) [W15] '(3)
D5: (T) [W12] '(4)
E5: (T) '(5)
F5: (T) [W30] '(6)

A6: (T) [W15] +[FORMMSE]A5
B6: (T) [W33] +[FORMMSE]B5
C6: (T) [W15] +[FORMMSE]C5
D6: (T) [W12] 'St-S(mean)
E6: (T) 'At-A(mean)
F6: (T) [W30] '(4)*(5)
A7: (T) [W15] +[FORMMSE]A6
B7: (T) [W33] +[FORMMSE]B6
C7: (T) [W15] +[FORMMSE]C6
D7: (T) [W12] +B7-$B$19
E7: (T) +C7-$C$19
F7: (T) [W30] +D7*E7
A8: (T) [W15] +[FORMMSE]A7
B8: (T) [W33] +[FORMMSE]B7
C8: (T) [W15] +[FORMMSE]C7
D8: (T) [W12] +B8-$B$19
E8: (T) +C8-$C$19
F8: (T) [W30] +D8*E8
A9: (T) [W15] +[FORMMSE]A8
B9: (T) [W33] +[FORMMSE]B8
C9: (T) [W15] +[FORMMSE]C8
D9: (T) [W12] +B9-$B$19
E9: (T) +C9-$C$19
F9: (T) [W30] +D9*E9
A10: (T) [W15] +[FORMMSE]A9
B10: (T) [W33] +[FORMMSE]B9
C10: (T) [W15] +[FORMMSE]C9
D10: (T) [W12] +B10-$B$19
E10: (T) +C10-$C$19
F10: (T) [W30] +D10*E10
A11: (T) [W15] +[FORMMSE]A10
B11: (T) [W33] +[FORMMSE]B10
C11: (T) [W15] +[FORMMSE]C10
D11: (T) [W12] +B11-$B$19
E11: (T) +C11-$C$19
F11: (T) [W30] +D11*E11
A12: (T) [W15] +[FORMMSE]A11
B12: (T) [W33] +[FORMMSE]B11
C12: (T) [W15] +[FORMMSE]C11
D12: (T) [W12] +B12-$B$19
E12: (T) +C12-$C$19
F12: (T) [W30] +D12*E12
A13: (T) [W15] +[FORMMSE]A12
B13: (T) [W33] +[FORMMSE]B12
C13: (T) [W15] +[FORMMSE]C12
D13: (T) [W12] +B13-$B$19
E13: (T) +C13-$C$19
F13: (T) [W30] +D13*E13
A14: (T) [W15] +[FORMMSE]A13
B14: (T) [W33] +[FORMMSE]B13
C14: (T) [W15] +[FORMMSE]C13
D14: (T) [W12] +B14-$B$19
E14: (T) +C14-$C$19
F14: (T) [W30] +D14*E14
A15: (T) [W15] +[FORMMSE]A14
B15: (T) [W33] +[FORMMSE]B14
C15: (T) [W15] +[FORMMSE]C14
D15: (T) [W12] +B15-$B$19
E15: (T) +C15-$C$19
F15: (T) [W30] +D15*E15
A16: (T) [W15] +[FORMMSE]A15
B16: (T) [W33] +[FORMMSE]B15
C16: (T) [W15] +[FORMMSE]C15
D16: (T) [W12] +B16-$B$19
E16: (T) +C16-$C$19
F16: (T) [W30] +D16*E16
A17: (T) [W15] +[FORMMSE]A16
F17: (T) [W30] @SUM(F7..F16)
A19: (T) [W15] 'Mean=
B19: (T) [W33] @AVG(B7..B16)
C19: (T) [W15] @AVG(C7..C16)
F19: (T) [W30] (F17/A17)/((B20)*(C20))
A20: (T) [W15] 'Std Dev=
B20: (T) [W33] @STD(B7..B16)
C20: (T) [W15] @STD(C7..C16)
A23: (T) [W15] 'UM=
B23: (T) [W33] ((B19-C19)^2)/([FORMMSE]B18)
A24: (T) [W15] 'US=
B24: (T) [W33] ((B20-C20)^2)/[FORMMSE]B18
A25: (T) [W15] 'UC=
B25: (T) [W33] (2*(1-F19)*B20*C20)/[FORMMSE]B18
A26: (T) [W15] 'Total=
B26: (T) [W33] @SUM(B23..B25)


3. Computations for Theil's Inequality Coefficient (TIC) 


A1: (T) [W41] 'PROJECT:
B1: (T) [W53] 'DE-A
A2: (T) [W41] 'THEIL Inequality Coefficient for
D2: (T) [W41] 'MAN-DAYS
E3: (T) [W15] ' Data
A5: (T) [W41] '(1)
B5: (T) [W53] '(2)
C5: (T) [W41] '(3)
D5: (T) [W41] '(4)
E5: (T) [W15] '(5)
F5: (T) [W15] '(6)
A6: (T) [W41] +[FORMMSE]A5
B6: (T) [W53] +[FORMMSE]B5
C6: (T) [W41] +[FORMMSE]C5
D6: (T) [W41] '(St-At)^2
E6: (T) [W15] 'St^2
F6: (T) [W15] 'At^2
A7: (T) [W41] +[FORMMSE]A6
B7: (T) [W53] +[FORMMSE]B6
C7: (T) [W41] +[FORMMSE]C6
D7: (T) [W41] +[FORMMSE]E6
E7: (T) [W15] +B7^2
F7: (T) [W15] +C7^2
A8: (T) [W41] +[FORMMSE]A7
B8: (T) [W53] +[FORMMSE]B7
C8: (T) [W41] +[FORMMSE]C7
D8: (T) [W41] +[FORMMSE]E7
E8: (T) [W15] +B8^2
F8: (T) [W15] +C8^2
A9: (T) [W41] +[FORMMSE]A8
B9: (T) [W53] +[FORMMSE]B8
C9: (T) [W41] +[FORMMSE]C8
D9: (T) [W41] +[FORMMSE]E8
E9: (T) [W15] +B9^2
F9: (T) [W15] +C9^2
A10: (T) [W41] +[FORMMSE]A9
B10: (T) [W53] +[FORMMSE]B9
C10: (T) [W41] +[FORMMSE]C9
D10: (T) [W41] +[FORMMSE]E9
E10: (T) [W15] +B10^2
F10: (T) [W15] +C10^2
A11: (T) [W41] +[FORMMSE]A10
B11: (T) [W53] +[FORMMSE]B10
C11: (T) [W41] +[FORMMSE]C10
D11: (T) [W41] +[FORMMSE]E10
E11: (T) [W15] +B11^2
F11: (T) [W15] +C11^2
A12: (T) [W41] +[FORMMSE]A11
B12: (T) [W53] +[FORMMSE]B11
C12: (T) [W41] +[FORMMSE]C11
D12: (T) [W41] +[FORMMSE]E11
E12: (T) [W15] +B12^2
F12: (T) [W15] +C12^2
A13: (T) [W41] +[FORMMSE]A12
B13: (T) [W53] +[FORMMSE]B12
C13: (T) [W41] +[FORMMSE]C12
D13: (T) [W41] +[FORMMSE]E12
E13: (T) [W15] +B13^2
F13: (T) [W15] +C13^2
A14: (T) [W41] +[FORMMSE]A13
B14: (T) [W53] +[FORMMSE]B13
C14: (T) [W41] +[FORMMSE]C13
D14: (T) [W41] +[FORMMSE]E13
E14: (T) [W15] +B14^2
F14: (T) [W15] +C14^2
A15: (T) [W41] +[FORMMSE]A14
B15: (T) [W53] +[FORMMSE]B14
C15: (T) [W41] +[FORMMSE]C14
D15: (T) [W41] +[FORMMSE]E14
E15: (T) [W15] +B15^2
F15: (T) [W15] +C15^2
A16: (T) [W41] +[FORMMSE]A15
B16: (T) [W53] +[FORMMSE]B15
C16: (T) [W41] +[FORMMSE]C15
D16: (T) [W41] +[FORMMSE]E15
E16: (T) [W15] +B16^2
F16: (T) [W15] +C16^2
A17: (T) [W41] +[FORMMSE]A16
D17: (T) [W41] +[FORMMSE]E16
E17: (T) [W15] @SUM(E7..E16)
F17: (T) [W15] @SUM(F7..F16)
A19: (T) [W41] 'U=
B19: (T) [W53] ((D17/A17)^0.5)/(((E17/A17)^0.5)+((F17/A17)^0.5))
APPENDIX B 


Input data files for the DE-A project, SCHEDULE variable, MAN-DAYS 
variable, and WORKFORCE variable at time t, where: 


S_t = Simulated (model) value at time t 

A_t = Actual value at time t 


SCHEDULE DATA TABLE 


oo a 
| 80 | 105.5 | 1336 
WORKFORCE DATA TABLE 
APPENDIX C 


Input data files for the DE-B project, SCHEDULE variable, MAN-DAYS 
variable, and WORKFORCE variable at time t, where: 


S_t = Simulated (model) value at time t 

A_t = Actual value at time t 


SCHEDULE DATA TABLE 





MAN-DAYS DATA TABLE 


poo | a 
_ 6 | mPa 
p80 | miss | 1336 
WORKFORCE DATA TABLE 
APPENDIX D 


Input data files for Ronan's Experiment, WORKFORCE variable at time t, for 
G-1460, G-1900, G-2185, and G-2470, where: 


S_t = Simulated (model) value at time t 

A_t = Actual value at time t 


G-1460 DATA TABLE 





G-1900 DATA TABLE 
G-2185 DATA TABLE 
G-2470 DATA TABLE 
APPENDIX E 


Input data files for Baker's Experiment, WORKFORCE variable and 
MAN-DAYS variable at time t, for Group A and Group B, where: 


S_t = Simulated (model) value at time t 

A_t = Actual value at time t 


GROUP A WORKFORCE DATA TABLE 





GROUP A MAN-DAYS DATA TABLE 


a SO So Lt 
| 8061.9 | 1140 













1539.1 1600 
1677.8 1583 
GROUP B WORKFORCE DATA TABLE 


At 


1624.4 





oY 


APPENDIX F 


The following listings are the input data table and the cell formula 
documentation for the WORKFORCE variable of Baker's Experiment, used to 
compute the proposed measure. The input data table is presented first, 
followed by the cell formula computations. 


1. Input data table for the student data (ACTUAL) and the model data at 
time t of the project's lifecycle. 


GROUP A WORKFORCE 
ACTUAL AND MODEL VALUES FOR TIME T 


| NAME | 0 | 40 | 80 | 120 | 160 | 200 | 240 | 280 | 320 |
BELL __—-+{5 {5 [53,55 |55 [56] 62 | 75] 
BITINER [5 [5 [6 |o [6 |7 [7 [6 | 
(BRANLEY | 5 | 596] 65 | 77 | 77 |l02|102 [9 | 
(CHELOUCHE [5 [4 ]4 13 [3 [3 [4 [8 |7_ 
(CULPEPPER [5 [5 [5 |7 17 [7 [4 [5 | 
FEY {5 [5 [5 |48|56,8 |5 |i | 
HODGKINS [5 [55] 55/6 | 65 |o5[68 | 68] 
WEY. {5 [6 |6 |e {oe [5 1/4 [4] 
ACO [5 [55] 55|49 | 49 |48[48 | 48] 






















E11:
F11:
G11:
H11:
I11:
J11:
A12:
B12:
C12:
D12:
E12:
F12:
G12:
H12:
I12:
J12:
A13:
B13:
C13:
D13:
E13:
F13:
G13:
H13:
I13:
J13:


Each line of the documentation lists a cell address followed by the contents
of that cell. For example, in the first listing, cell address A1: contains the
text "GROUP A WORKFORCE". This spreadsheet is linked to the input data
spreadsheets through file links. For example, the first link appears at cell
address B11, where ([AWFMOD]B5-[AWFACT]B5)^2 computes the squared error
(A_t − S_t)^2 for the student named BELL at t = 0. Because the difference is
squared, subtracting the actual value from the model value gives the same
result as the reverse.
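The cell-formula logic described above can be sketched in Python to make the spreadsheet computation explicit. This is a minimal illustration under stated assumptions, not the author's implementation: the dictionary layout (student name mapped to a list of WORKFORCE values, one per time point) and the function names `squared_error_table` and `root_sum_squared_errors` are hypothetical stand-ins for the [AWFACT] and [AWFMOD] files and the (At-St)^2 table. The sketch squares each student's error at each time, then, for each time t, takes the square root of the sum over students.

```python
import math


def squared_error_table(actual, model):
    """Return {student: [(A_t - S_t) ** 2 for each time t]}.

    `actual` and `model` are hypothetical dicts mapping a student name to a
    list of WORKFORCE values, one per time point (t = 0, 40, ..., 320),
    standing in for the [AWFACT] and [AWFMOD] spreadsheets.
    """
    table = {}
    for student, a_values in actual.items():
        s_values = model[student]  # KeyError if the model data lacks a student
        if len(a_values) != len(s_values):
            raise ValueError(f"{student}: actual and model series differ in length")
        # Squaring makes (A - S) equivalent to the spreadsheet's (S - A).
        table[student] = [(a - s) ** 2 for a, s in zip(a_values, s_values)]
    return table


def root_sum_squared_errors(actual, model):
    """For each time t, sum the squared errors over all students, then take the square root."""
    table = squared_error_table(actual, model)
    if not table:
        return []
    lengths = {len(row) for row in table.values()}
    if len(lengths) != 1:
        raise ValueError("all students must have the same number of time points")
    n_times = lengths.pop()
    return [
        math.sqrt(sum(row[k] for row in table.values()))
        for k in range(n_times)
    ]


if __name__ == "__main__":
    # Made-up values for two hypothetical students at two time points.
    actual = {"STUDENT_1": [5, 8], "STUDENT_2": [5, 1]}
    model = {"STUDENT_1": [5, 5], "STUDENT_2": [5, 5]}
    print(root_sum_squared_errors(actual, model))  # [0.0, 5.0]
```

With these made-up values the squared errors at the second time point are 9 and 16, so the measure there is the square root of 25, or 5.0. Keeping the per-student squared errors as a separate table mirrors the spreadsheet's layout, where each row of the (At-St)^2 table belongs to one student and each column to one time point.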


[W14] "GROUP A WORKFORCE
[W14] 'Computation for the square root of the sum of squared errors
[W14] 'for time t
[W6] '(At-St)^2 Table

[W6] 0
[W6] 40
[W7] 80
[W7] 120
[W6] 160
[W6] 200
[W6] 240
[W6] 280
[W6] 320

[W14] 'BELL

(F2) [W6] ([AWFMOD]B5-[AWFACT]B5)^2
(F2) [W6] ([AWFMOD]C5-[AWFACT]C5)^2
(F2) [W7] ([AWFMOD]D5-[AWFACT]D5)^2
(F2) [W7] ([AWFMOD]E5-[AWFACT]E5)^2
(F2) [W6] ([AWFMOD]F5-[AWFACT]F5)^2
(F2) [W6] ([AWFMOD]G5-[AWFACT]G5)^2
(F2) [W6] ([AWFMOD]H5-[AWFACT]H5)^2
(F2) [W6] ([AWFMOD]I5-[AWFACT]I5)^2
(F2) [W6] ([AWFMOD]J5-[AWFACT]J5)^2
[W14] 'BITTNER

(F2) [W6] ([AWFMOD]B6-[AWFACT]B6)^2
(F2) [W6] ([AWFMOD]C6-[AWFACT]C6)^2
(F2) [W7] ([AWFMOD]D6-[AWFACT]D6)^2
(F2) [W7] ([AWFMOD]E6-[AWFACT]E6)^2
(F2) [W6] ([AWFMOD]F6-[AWFACT]F6)^2
(F2) [W6] ([AWFMOD]G6-[AWFACT]G6)^2
(F2) [W6] ([AWFMOD]H6-[AWFACT]H6)^2
(F2) [W6] ([AWFMOD]I6-[AWFACT]I6)^2
(F2) [W6] ([AWFMOD]J6-[AWFACT]J6)^2
[W14] 'BRANLEY

(F2) [W6] ([AWFMOD]B7-[AWFACT]B7)^2
(F2) [W6] ([AWFMOD]C7-[AWFACT]C7)^2
(F2) [W7] ([AWFMOD]D7-[AWFACT]D7)^2
(F2) [W7] ([AWFMOD]E7-[AWFACT]E7)^2
(F2) [W6] ([AWFMOD]F7-[AWFACT]F7)^2
(F2) [W6] ([AWFMOD]G7-[AWFACT]G7)^2
(F2) [W6] ([AWFMOD]H7-[AWFACT]H7)^2
(F2) [W6] ([AWFMOD]I7-[AWFACT]I7)^2
(F2) [W6] ([AWFMOD]J7-[AWFACT]J7)^2
A14:
B14:
C14:
D14:
E14:
F14:
G14:
H14:
I14:
J14:
A15:
B15:
C15:
D15:
E15:
F15:
G15:
H15:
I15:
J15:
A16:
B16:
C16:
D16:
E16:
F16:
G16:
H16:
I16:
J16:
A17:
B17:
C17:
D17:
E17:
F17:
G17:
H17:
I17:
J17:
A18:
B18:
C18:
D18:
E18:
F18:
G18:
H18:
I18:
J18:
A19:
B19:
C19:
D19:
E19:


[W14]


(F2) 
(F2) 
(F2) 
(F2) 
(F2) 
(F2) 
(F2) 
(F2) 
(F2) 


[W14]


(F2) 
(F2) 
(F2) 
(F2) 
(F2) 
(F2) 
(F2) 
(F2) 
(F2) 


[W14]


(F2) 
(F2) 
(F2) 
(F2) 
(F2) 
(F2) 
(F2) 
(F2) 
(F2) 


[W14]


(F2) 
(F2) 
(F2)
(F2) 
(F2) 
(F2) 
(F2) 
(F2) 
(F2) 


[W14]


(F2) 
(F2) 
(F2)
(F2)
(F2) 
(F2) 
(F2) 
(F2) 
(F2) 


[W14] 


(F2)
(F2)
(F2) 
(F2) 


'CHELOUCHE


[W6]
[W6]
[W7]
[W7]
[W6]
[W6]
[W6]
[W6]
[W6]